Abstract-- We present an efficient framework for synthesizing
look-up table (LUT) networks.

Some of the existing LUT network synthesis methods are based
on functional (Boolean) decompositions. Our method also uses
functional decompositions, but it draws on various decomposition
methods, including algebraic decompositions. It can therefore
be thought of as a general framework for synthesizing LUT
networks by integrating various decomposition methods.
We use a cost database file, which is a unique characteristic of our
method. We present comparisons between our method and some
well-known LUT network synthesis methods, and evaluate the
final results after placement and routing. Although our method
is rather heuristic in nature, the experimental results are
encouraging.


I. Introduction

When implementing a combinational logic function using 
a given technology, the desired function must be decomposed 
or factorized to smaller functions so that the decomposed 
functions can fit onto the implementation primitives of the 
technology. Accordingly, many decomposition methods have 
been proposed. Most of these methods are based on transforming
the algebraic expressions of switching formulas; we call them
algebraic decomposition methods. Kernel extraction [1] is one
well-regarded example. Such
decomposition methods appear to be reasonable in conjunction 
with the technology mapping phase for standard technology 
libraries. 

To realize combinational logic functions using a lookup table 
(LUT) based field programmable gate array (FPGA), we must 
generate an LUT network where each LUT is a special node 
that can realize any function with K (typically 4 or 5) inputs. 
Most LUT network synthesis methods can be divided into the 
following two categories. 

The methods in the first category are extended methods for
the standard technology libraries:

• First, a logic optimizer performs decomposition and
technology-independent optimization. In this phase, algebraic
decomposition methods are usually used, and the number of
literals is used for the cost considering the standard
technology libraries.

• Then, a technology mapper covers nodes to K-input LUTs by
methods such as Chortle-d [2], MIS-pga-delay [3]¹ and
FlowMap [4]. For the covering phase, optimal algorithms have
been developed [4].

However, the intermediate networks before the covering phase
often affect the final results; in such cases, the final results
are sometimes not so good.

The methods in the second category consist of only one
phase; they directly transform primary output functions (not
expressions) in terms of primary inputs represented by an
ordered binary decision diagram (OBDD) [5, 6]. (Below
we call transformations of functions functional decomposition
methods.) Therefore, the final results are not affected by
intermediate networks and are usually better than the results of
the methods in the first category.

The decomposition form of the functional decomposition
methods in the second category is limited
to a specified form based on Disjoint Decomposition [7]. In
some cases, another type of functional decomposition called
Non-Disjoint Bi-Decomposition may be more appropriate [8].
However, there is no method that actively tries to utilize various
functional decomposition methods, including Non-Disjoint
Bi-Decomposition, in synthesizing LUT networks.

In light of the above discussion, we propose a method
with the following properties:

• Various decomposition methods, not only algebraic but
also functional decomposition methods, can be integrated.

• It consists of only a decomposition phase. That is, we
do not need to consider the covering effect after the
decomposition phase.

• To select the "best probable" decomposition at an interme- 
diate decomposition, a "cost database file" is introduced. 

Various decomposition methods, such as Disjoint Decomposition
[7], Non-Disjoint Bi-Decomposition [8], Weak Division
by Kernels and Davio Expansion, can be integrated into our
method. It can be thought of as an extension of the
methods in the second category and a general framework for
synthesizing LUT networks by integrating various decomposition
methods. Although it is rather heuristic in nature, the
experimental results are very encouraging.

¹MIS-pga partially uses a functional decomposition method.

This paper is organized as follows. In Section II, we 
briefly explain decomposition methods which are used in our 
method. In Section III, we propose a method of using various
decomposition methods to synthesize LUT networks. We 
present experimental results in Section V. We mention the 
features of our method in Section VI. Section VII concludes 
this paper. 

II. Preliminary 

We treat a network as a directed acyclic graph (DAG) where 
each node has a specified internal function with respect to its 
fanins. If the number of fanins of a node is not more than K,
we call the node K-feasible.

Our problem is to generate the lowest cost network where
all nodes are K-feasible.

The cost of a decomposed network is defined as follows:
(the number of nodes in the network) + W × (the levels of the
network), where W is the user-defined weight.
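As a minimal sketch of this cost definition, the following Python computes the cost of a small network and checks K-feasibility. The dictionary representation `fanins`, the function names, and the convention that any name without an entry is a primary input at level 0 are assumptions made for illustration.

```python
from functools import lru_cache


def network_cost(fanins, weight):
    """Return (#nodes) + weight * (#levels) for a DAG.

    `fanins` maps each internal node to the list of its fanins; a name
    that is not a key is treated as a primary input (level 0).
    """
    @lru_cache(maxsize=None)
    def level(node):
        if node not in fanins:                      # primary input
            return 0
        return 1 + max((level(f) for f in fanins[node]), default=0)

    num_nodes = len(fanins)
    num_levels = max((level(n) for n in fanins), default=0)
    return num_nodes + weight * num_levels


def is_k_feasible(fanins, node, k):
    """A node is K-feasible if it has at most K fanins."""
    return len(fanins[node]) <= k


# Hypothetical three-node network over primary inputs a..e.
example = {"g": ["a", "b", "c"], "h": ["g", "d"], "out": ["g", "h", "e"]}
```

With this example, the network has three nodes and three levels, so a larger W penalizes depth more strongly relative to node count.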

We consider various decomposition methods to be incorpo- 
rated into our method. Here we summarize some of them as 
follows. 



Disjoint Decomposition

The disjoint decomposition has the form
$f = \alpha(g_1(X^B), \ldots, g_t(X^B), X^F)$, where $X^B$
and $X^F$ are disjoint variable sets [7]. This decomposition
can be found by using the OBDD representing the function
of a node to be decomposed [6, 9]. In the previous LUT
network synthesis methods, $|X^B|$ is limited to K so that each
$g_i(X^B)$ can be mapped into a single LUT. In our method, we
prepare this kind of method with $|X^B|$ from 3 up to K because
we also consider the covering effect at the same time when
decomposing a node, which will be discussed later.
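To make the disjoint form concrete, the sketch below counts the distinct cofactors of a function over all assignments to a candidate bound set (the classical column multiplicity). It works on explicit truth tables rather than on OBDDs, so it is a swapped-in illustration of the underlying condition, not the OBDD-based procedure of [6, 9]. The names `column_multiplicity`, `num_subfunctions`, and `example_f` are hypothetical.

```python
from itertools import product


def column_multiplicity(f, n, bound):
    """Count distinct cofactors of f over assignments to the bound set.

    f     : callable taking a tuple of n bits and returning 0 or 1
    bound : indices of the bound-set variables; the rest are the free set
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_bits in product((0, 1), repeat=len(bound)):
        column = []
        for f_bits in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, v in zip(bound, b_bits):
                x[i] = v
            for i, v in zip(free, f_bits):
                x[i] = v
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)


def num_subfunctions(f, n, bound):
    """Smallest t with mu <= 2**t, i.e. the number of bound-set functions."""
    mu = column_multiplicity(f, n, bound)
    return (mu - 1).bit_length()


# Hypothetical 5-input example: f = (x0 x1 x2) xor (x3 x4).
def example_f(x):
    return (x[0] & x[1] & x[2]) ^ (x[3] & x[4])
```

For `example_f`, the bound set {x0, x1, x2} needs only one sub-function, whereas the bound set {x0, x3} needs two, which is no better than passing the two variables through unchanged.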

Non-Disjoint Bi-Decomposition 

The decomposition form $f = \alpha(g_1(X^1), g_2(X^2))$, where $X^1$
and $X^2$ are not limited to disjoint variable sets, can be
effectively found by the method proposed in [8]. For some
functions, this decomposition form is better than Disjoint
Decomposition [8]. The method can treat an incompletely
specified function for $f$, and represents $g_1$ and $g_2$ as
incompletely specified functions, which is an advantage of this method.
However, if we want to use this decomposition together with
Disjoint Decomposition, we need to consider covering nodes
at the same time, as will be mentioned in Section III, because
this decomposition produces a two-input node as a root node
for a decomposition, which is very different from the case of
Disjoint Decomposition. This is one of our motivations for
proposing the framework in this paper.



Weak Division by Kernels 

This is a decomposition method using sum-of-products
expressions [1]. Its computation time is usually shorter than
that of functional decomposition methods since it is based on the
algebraic division of expressions.
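The algebraic division that this method relies on can be sketched as follows. This is the textbook weak-division procedure on cube sets, shown under the assumption that an expression is a set of cubes and each cube is a frozenset of literal names; choosing the divisor (for example, a kernel) is not shown. The name `weak_divide` is hypothetical.

```python
def weak_divide(f, d):
    """Algebraic (weak) division of sum-of-products f by d.

    f, d : sets of cubes; each cube is a frozenset of literal names.
    Returns (quotient, remainder) with f = quotient * d + remainder.
    """
    quotient = None
    for d_cube in d:
        # Cubes of f that contain d_cube, with d_cube's literals removed.
        partial = {f_cube - d_cube for f_cube in f if d_cube <= f_cube}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    covered = {q | d_cube for q in quotient for d_cube in d}
    remainder = set(f) - covered
    return quotient, remainder


# f = ac + ad + bc + bd + e, divided by d = a + b.
f_expr = {frozenset("ac"), frozenset("ad"), frozenset("bc"),
          frozenset("bd"), frozenset("e")}
d_expr = {frozenset("a"), frozenset("b")}
q_expr, r_expr = weak_divide(f_expr, d_expr)
```

Here the quotient is c + d and the remainder is e, so f can be rewritten as (a + b)(c + d) + e.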

Davio Expansion 

The Davio Expansion has the following three forms:

• $f = f_0 \oplus x_i \cdot f_2$,

• $f = f_1 \oplus \bar{x}_i \cdot f_2$, and

• $f = \bar{x}_i \cdot f_0 \oplus x_i \cdot f_1$,

where $f_0 = f|_{x_i=0}$, $f_1 = f|_{x_i=1}$, and $f_2 = f_0 \oplus f_1$.
This decomposition is important because any function can be
decomposed by using these expansions.
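The three forms can be checked exhaustively on a small function. The sketch below assumes a function is a Python callable over a tuple of bits; the helper names (`cofactor`, `expansions`, `equivalent`, `majority`) are hypothetical. The third form coincides with the Shannon expansion written with exclusive-or.

```python
from itertools import product


def cofactor(f, i, value):
    """Return f with variable x_i fixed to `value`."""
    return lambda x: f(x[:i] + (value,) + x[i + 1:])


def expansions(f, i):
    """Build the three expansion forms of f with respect to x_i."""
    f0 = cofactor(f, i, 0)
    f1 = cofactor(f, i, 1)
    f2 = lambda x: f0(x) ^ f1(x)
    form1 = lambda x: f0(x) ^ (x[i] & f2(x))
    form2 = lambda x: f1(x) ^ ((1 - x[i]) & f2(x))
    form3 = lambda x: ((1 - x[i]) & f0(x)) ^ (x[i] & f1(x))
    return form1, form2, form3


def equivalent(f, g, n):
    """Compare two n-input functions on every input assignment."""
    return all(f(x) == g(x) for x in product((0, 1), repeat=n))


# Three-input majority function as a test case.
def majority(x):
    return (x[0] & x[1]) | (x[1] & x[2]) | (x[0] & x[2])
```

Because each form only needs the cofactors of one variable, it applies to any function, at the price of possibly producing sub-functions that are not much simpler than the original.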

III. Our LUT Network Synthesis Method 
A. Concept of Our Method 

Our strategy is based on the following concept. Suppose 
we have various decomposition methods. We can find the best 
decomposed network from the search space by considering all 
of the possible combinations of the decomposition methods 
and the covering effect. However, performing an exhaustive 
search for all of the possible combinations is not practical. 
Therefore, we instead select a "best probable" decomposition 
at an intermediate decomposition. 

If we must think of the covering effect after the decomposi- 
tion phase, it becomes difficult to determine a "best probable" 
decomposition at each intermediate decomposition, because 
the decomposition forms are likely to be different between 
some of the decomposition methods. Thus, it is difficult to 
predict the covering effect when the decomposition is being
performed.

With this in mind, we evaluate the "cost" of a decomposition
form with the following strategy.

• We evaluate the cost of a decomposition including the 
covering effect at the same time. 

• We predict the cost of nodes whose supports are more than
K by using a "cost database file" which describes the
decomposition costs of functions from previously designed
networks.



B. Outline of Our Method 

The overall procedure of generating a network whose nodes
are all K-feasible is as follows.

Step 1 : Construct an initial network that has only primary 
output nodes whose internal logics correspond to the 
primary output functions (in terms of primary inputs) of 
the given specification. 
Step 2: As long as there remains a node that is not K-feasible,
we decompose the node by using a selected decomposition
method. How to select a "best probable" decomposition
is mentioned in Section III-C.

For a node to be decomposed, we prepare both sum-of-products
expressions and functions by OBDDs for the internal logic in
order to utilize both algebraic and functional decomposition
methods.
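The two steps can be sketched as a loop over a pluggable list of decomposition methods. The code below is an illustration under strong simplifying assumptions: node functions are explicit truth tables rather than OBDDs or expressions, the only decomposer supplied is a Shannon-style split (so K must be at least 3), and the candidate cost is simply the number of nodes a candidate introduces. All names (`shannon_split`, `synthesize`, `evaluate`) are hypothetical.

```python
from itertools import product


def shannon_split(name, variables, table):
    """Replace one node by a 3-input root and two cofactor nodes."""
    x, rest = variables[0], variables[1:]
    rest_bits = list(product((0, 1), repeat=len(rest)))
    lo = {bits: table[(0,) + bits] for bits in rest_bits}
    hi = {bits: table[(1,) + bits] for bits in rest_bits}
    # Root selects the x=1 cofactor when x is 1, else the x=0 cofactor.
    mux = {(s, a, b): (b if s else a)
           for s, a, b in product((0, 1), repeat=3)}
    return {
        name: ((x, name + "_0", name + "_1"), mux),
        name + "_0": (rest, lo),
        name + "_1": (rest, hi),
    }


def synthesize(outputs, k, decomposers, cost):
    """Step 1: start from output nodes. Step 2: decompose until feasible."""
    assert k >= 3, "the Shannon root node has three fanins"
    network = dict(outputs)
    while True:
        pending = [n for n, (fanins, _) in network.items()
                   if len(fanins) > k]
        if not pending:
            return network
        name = pending[0]
        fanins, table = network[name]
        candidates = [d(name, fanins, table) for d in decomposers]
        network.update(min(candidates, key=cost))


def evaluate(network, name, values):
    """Simulate the network; names without a node are primary inputs."""
    if name not in network:
        return values[name]
    fanins, table = network[name]
    return table[tuple(evaluate(network, f, values) for f in fanins)]


variables = ("a", "b", "c", "d", "e")
spec = {bits: (bits[0] & bits[1]) ^ (bits[2] | bits[3]) ^ bits[4]
        for bits in product((0, 1), repeat=5)}
lut_network = synthesize({"out": (variables, spec)}, k=3,
                         decomposers=[shannon_split], cost=len)
```

Adding another decomposition method only requires appending a function with the same signature to `decomposers`; the loop then picks whichever candidate has the lowest cost for each node.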



Fig. 1. Decomposition form of a node.



C. How to Select a "Best Probable" Decomposition

We characterize a decomposition form of the various decomposition
methods used in our method as follows: a decomposition form of a
node n is characterized as a node n', which is a replacement of n,
and newly introduced nodes $n_1, n_2, \ldots$, which are fanins of n'.
We can treat most decomposition methods in this form. Fig. 1(b)
shows an example of this for Bi-Decomposition based methods. We
call the set of nodes introduced at the decomposition "DecompArea"
(the dotted rectangle in Fig. 1(b)).

In our decomposition form, we do not share common functions between
some functions. This is because we deliberately ignore such sharing
in order to uniformly treat various decomposition methods. However,
sharing can be considered as an extension of our method and will be
mentioned in Section IV-A.

We select a "best probable" decomposition form of a node
at Step 2 in our method by evaluating the "cost" of the
decomposition. Since we want to treat various decomposition
methods, we must consider the case where the number of fanins of
a node in the DecompArea is less than K. For example, the
number of fanins of n' is two when a decomposition method
based on Bi-Decomposition is used. Such a node may be
merged into a node not in the DecompArea. Since our strategy
does not perform the covering phase after the decomposition
phase, we try to merge such a node, which is at the boundary of
the DecompArea, into a node not in the DecompArea to form
a newly merged node if the merged node is still K-feasible,
as shown in Fig. 1(c). In this example, n' and $n_2$ can be
merged into other nodes, so we do not consider them in the
decomposition cost. Accordingly, the cost evaluation after
the merging of the nodes simultaneously includes the covering
effect.

For the DecompArea after the merging (the dotted rectangle
in Fig. 1(c)), our cost is defined as:

$$\text{cost of a decomposition} = \Big\{ \sum_{n_i \in \mathrm{DecompArea}} \mathrm{CostLUT}(n_i) \Big\} + W \times \Big\{ \max_{n_i \in \mathrm{DecompArea}} \mathrm{LEV}(n_i) \Big\}$$

where W is the user-defined weight. $\mathrm{LEV}(n_i)$ is recursively
defined as follows, and it becomes 0 for a primary input node.

$$\mathrm{LEV}(n_i) = \Big\{ \max_{n_j \in \text{fanins of } n_i} \mathrm{LEV}(n_j) \Big\} + \mathrm{CostLEV}(n_i)$$

$\mathrm{CostLUT}(n_i)$ and $\mathrm{CostLEV}(n_i)$ denote the predicted
numbers of K-LUTs and the levels for implementing the internal
function of $n_i$, respectively. They become 1 for a K-feasible
node. However, we cannot know the precise values
of $\mathrm{CostLUT}(n_i)$ and $\mathrm{CostLEV}(n_i)$ if $n_i$ is not K-feasible.
Therefore, we determine their values by looking up a cost
database file as mentioned in Section III-D.

D. Decomposition Cost

In our cost strategy, at first we prepare a cost database
file which sums the statistical relationships between some
parameters characterizing the output function (in terms of
primary inputs) of a node $n_i$, and $\mathrm{CostLUT}(n_i)$ and
$\mathrm{CostLEV}(n_i)$. In the present implementation, we use the
number of supports of the function, and the number of cubes
and literals in an expression for the function.

We generated a cost database file as shown here, but clearly
this is not the only method.

• We make a first cost database file in which $\mathrm{CostLUT}(n_i)$
and $\mathrm{CostLEV}(n_i)$ take the same value as follows.

$$\mathrm{CostLUT}(n_i) = \mathrm{CostLEV}(n_i) = \begin{cases} 1, & \text{if } n_i \text{ is K-feasible} \\ (\text{the number of fanins of } n_i) - K + 1, & \text{otherwise} \end{cases}$$

This value is taken from [5]. We do not consider the
number of cubes and literals in this first cost database file.

• Using the first cost database file, we generate various
networks by our method. We then make a second cost
database file in which each entry describes a statistical
relationship between the above three parameters for the
output function of each node in the decomposed networks
and the number of transitive fanins of the node and
levels of the node, which correspond to CostLUT and
CostLEV, respectively.

TABLE I

supports | cubes | literals || CostLUT | CostLEV
7        | 24    | 113      || 7       | 3

For example, if we need 7 LUTs and 3 levels to implement
a function whose supports, cubes and literals are 7, 24
and 113, respectively, by using the first cost database file (this
actually happened in our experiments), we get an entry in the
second cost database file as shown in Table I. We think the
second cost database file is more accurate than the first cost
database file because the latter predicts that both CostLUT
and CostLEV for the function are 3 (K = 5), which is quite
different from the actual results. The second cost database file
can be thought of as feedback from the previously designed
results. Actually, we usually obtain better results with the
second cost database file than we do by using the first cost
database file.

With the cost database file, we calculate $\mathrm{CostLUT}(n_i)$ and
$\mathrm{CostLEV}(n_i)$ as follows.

• Calculate the three parameters from the internal function of
$n_i$.

• Find the values of $\mathrm{CostLUT}(n_i)$ and $\mathrm{CostLEV}(n_i)$ in
the entry that produces the best fit for the three parameters
in the cost database file.
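A minimal sketch of the two kinds of cost database follows. The first-database value (1 if K-feasible, otherwise fanins minus K plus 1) is the one stated in the text. The text does not define "best fit", so the squared Euclidean distance used in `lookup` is an assumption, as are the list-of-tuples layout and the second database entry; only the entry (7, 24, 113) with costs (7, 3) comes from the paper's example.

```python
def first_cost(num_fanins, k):
    """First cost database: one value used for both CostLUT and CostLEV."""
    return 1 if num_fanins <= k else num_fanins - k + 1


def lookup(database, supports, cubes, literals):
    """Return (CostLUT, CostLEV) of the closest entry.

    `database` is a list of ((supports, cubes, literals), (lut, lev)).
    Closeness is squared Euclidean distance, an assumed fit measure.
    """
    def distance(entry):
        s, c, l = entry[0]
        return ((s - supports) ** 2 + (c - cubes) ** 2
                + (l - literals) ** 2)

    return min(database, key=distance)[1]


second_database = [
    ((7, 24, 113), (7, 3)),   # entry from the paper's example
    ((6, 10, 40), (3, 2)),    # hypothetical entry for illustration
]
```

For the 7-support function of the example, `first_cost(7, 5)` gives 3 for both costs, while a lookup near (7, 24, 113) in the second database returns the observed 7 LUTs and 3 levels.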

In most of the previous logic synthesis methods, the cost of a 
function is usually measured only by the number of supports of 
the function or literals in the logic expression of the function. 
We can use both parameters in our method.

We can generate the third cost database file from the second 
cost database file in the same way. With the third cost database 
file, we sometimes obtain better results than we do by using 
the second cost database file. 
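The decomposition cost of Section III-C can be sketched in a few lines once the predicted costs are available. This sketch assumes that the level term is W times the largest LEV value over the DecompArea, that primary inputs are the names without a fanin list, and that `cost_lut` and `cost_lev` have already been filled in from a cost database file; the node names and numbers are hypothetical.

```python
def decomposition_cost(area, fanins, cost_lut, cost_lev, weight):
    """Sum of CostLUT over the area plus weight times the largest LEV."""
    memo = {}

    def lev(node):
        if node not in fanins:                 # primary input: LEV = 0
            return 0
        if node not in memo:
            deepest = max((lev(j) for j in fanins[node]), default=0)
            memo[node] = deepest + cost_lev[node]
        return memo[node]

    lut_term = sum(cost_lut[n] for n in area)
    lev_term = max(lev(n) for n in area)
    return lut_term + weight * lev_term


# Hypothetical DecompArea for K = 5: n1 has 7 fanins, so its predicted
# costs are 7 - 5 + 1 = 3; the other two nodes are K-feasible.
area_fanins = {
    "n1": ["a", "b", "c", "d", "e", "f", "g"],
    "n2": ["h", "i"],
    "n'": ["n1", "n2"],
}
predicted = {"n1": 3, "n2": 1, "n'": 1}
area_nodes = ["n'", "n1", "n2"]
```

In this example the LUT term is 5 and the deepest node reaches level 4, so the cost is 9 for W = 1 and 13 for W = 2.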



IV. An Extension of Our Method

A. Sharing Sub Functions 

As previously mentioned, our strategy takes little account of
the sharing of common functions, which sometimes dramatically
reduces the network cost. Therefore, we plan to add
the following operation to the decomposition methods used
at Step 2: when a node $n_i$ is decomposed, we check whether
an existing node $n_j$ can be used for the node. This can be
accomplished by dividing the expression of $n_i$ by the
expression of $n_j$, which is called algebraic resubstitution.
This can also be done by utilizing the Boolean resubstitution
and the support minimization technique proposed in [6]. Note
that we can adopt the above operation as a decomposition
method in our framework if we do not
consider $n_j$ in the decomposition cost.

In our framework, we can also prepare another operation 
to share common functions: after all decompositions, the 
minimization method proposed in [11] is performed to replace
the output of a node with that of another node. 

B. Speeding Up the Framework 

We believe that some decomposition methods should be
applied first if possible. For example, a simple disjunctive
decomposition usually provides good decomposition forms 
that can be found relatively fast [10]. Such decomposition 
methods should be applied before the decomposition of a node 
at Step 2. We expect that this process will sometimes reduce 
the total computation time. 

Another technique for speeding up the framework is to
independently check each decomposition method at Step 2.
Indeed, we can perform decomposition methods in parallel on
different processors, and this reduces the computation time.
The idea is as follows. If a decomposition method takes a much
longer time to decompose some functions (or expressions)
than other decomposition methods, the decomposition result is
usually worse. Therefore, in a parallel implementation we can
abandon the decomposition methods that take too much time
without sacrificing the quality of our results. This dramatically
reduces the computation time. As processors are getting
cheaper and cheaper, an implementation in parallel becomes
more attractive for our method.



V. Experimental Results

A. Evaluation of Various Implementations 

We can get various results from various implementations of 
our method; for example, the implementation varies depending 
on the types of decomposition methods that are integrated and 
the cost database file that is used. All of our results shown in 
this section were produced by an implementation using Davio 
Expansion with each variable, Disjoint Decomposition with 
$|X^B|$ as 3, 4 and 5, and Non-Disjoint Bi-Decomposition.

In our cost strategy, we originally expected the following
features:

• The results obtained with the (n + l)-th cost database file 
are usually better than those with the n-th cost database 
file. 

• We can control the trade-off between network levels and 
the number of nodes by using the user defined weight W. 

We performed experiments with the first, second and third 
cost database files and W = 0.5, 1 and 2.0. From a comparison
of the results, we could not confirm the above features, but
instead found the following:

• The results with the second cost database file and W = 1 
are usually the best. This means that we cannot expect 
the third cost database file to always be better than the 
second cost database file. 

• If W is larger, the number of levels usually becomes smaller.
However, changing W seemed to have no effect on the number
of nodes.

From the above, we do not consider our current cost database 
files to be robust. However, the differences between the various 
cost database files and W were not so large, and all results were 
thought to be good enough, as we will see in the following 
sub-sections. 

B. Comparison Before Placement and Routing 

Table II compares the mapping results for 5-LUT networks 
between our method and several of the well-known level- 
optimized LUT network synthesis methods. Our results were 
obtained with the second cost database file and W = 1.

The sub-columns "#lut" and "#lvl" show the numbers of
5-LUTs and network levels, respectively. The sub-column 






TABLE II 

Comparison of Mapping Results for 5-LUT Networks 




"CPU" indicates the CPU run-time (sec.) on a Sun Ultra 2 
2200. To compare our results with other results, we show the 
total numbers for the same circuits in the lower part of the table. 
The shaded numbers indicate the best results. Our framework 
appears relatively good in the comparison. We think one reason 
is that Non-Disjoint Bi-Decomposition sometimes provides 
good decompositions. Our method sometimes needed a long
computation time, which we do not think is a very serious
problem, as mentioned in Section IV-B.



C. Comparison After Placement and Routing 

We have incorporated the proposed method into 
PARTHENON [13], which consists of a simulator and syn- 
thesizers for a hardware description language SFL (Structured 
Function description Language). 

To evaluate the integrated system, we compared the follow- 
ing two logic synthesis flows. 

Using the mapping method in MAX+plus II

Step 1 Convert the file format and perform the logic
synthesis at the technology independent level (including
logic reduction) by PARTHENON, and output the
result to MAX+plus II, which is the development
system for Altera devices.
Step 2 Perform the technology mapping for the Altera
FLEX8000 series [14] by MAX+plus II.
Step 3 Perform placement and routing by MAX+plus II.

Using our mapping method

Step 1 Convert the file format by PARTHENON.
Step 2 Our method is called from the PARTHENON system
to perform the technology mapping for 4-LUT
networks². Then PARTHENON outputs the result
with the mapping information³ to MAX+plus II.
Step 3 Perform placement and routing by MAX+plus II.
The two flows differ in the method used to
generate LUT networks: our proposed method or MAX+plus II.

Table III shows the results after placement and routing.
"#LE" and "Delay" show the numbers of logic elements and
the delay values (ns) for the longest paths in the final results,
respectively. From the table, we can see that our method
also has a good effect on the final results after placement and
routing.

VI. Features of Our Method

The proposed method has the following features. 

• Various decomposition methods can easily be integrated
into our method. If a new decomposition algorithm has
been developed, we can easily check its effectiveness in
our framework.

• We can get various results from various implementations
of our method. Therefore, we are able to obtain various
decomposed networks for a given specification, and can
explore a large design space.

²A logic element of FLEX8000 has one 4-input, 1-output LUT.
³An LCELL primitive in MAX+plus II can be used to attach the mapping information.

There are some interesting features in our cost strategy. It is 
natural for a "bad" entry (which we think has a bad effect on 
our cost strategy) to be generated in our cost database file from
a "bad" node for which abnormal (unexpected) numbers
of nodes and/or levels were used in previously designed
networks. In our experiment, we ignored a "bad" entry in the 
cost database file. However, it was interesting that when we 
resynthesized a "bad" node, the numbers of nodes and levels 
for the node were reduced at times to normal values in our cost 
database file. We think this feedback to the resynthesis is one 
of the advantages of our framework. In other words, the cost 
of the network was sometimes reduced by resynthesizing the 
output of "bad" nodes. 



VII. Conclusion 

We have proposed an efficient method for synthesizing LUT 
networks. In our method, we successfully integrated many 
decomposition methods that are not only algebraic but also 
functional. Our method can be thought of as a general frame- 
work for synthesizing LUT networks by integrating various 
decomposition methods. 

Currently, our framework cannot treat large networks because
some of the functional decomposition methods cannot treat
large functions. In the future, therefore, we would like to
improve the framework by incorporating it with appropri- 
ate network partitioning methods, and to extend it using the 
techniques presented in this paper. 